Fix slowdown after generating backslash escape sequences #91
OK, now I think I understand the problem you reported and the solution you proposed. However, I think your specific solution may create illegal outputs: if the topmost parser still has characters that can be inserted, we are not necessarily returning the correct results. Better yet, this check should happen not inside the shortcut key but in the JSONSchemaParser itself: if a member of the stack has can_end=True and get_allowed_characters() is empty, it shouldn't be there anymore, because we know what will happen next time. I think this is the correct solution for this problem.
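For readers following along, the pruning idea described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: ToyParser and prune_exhausted are hypothetical names, and can_end is modeled as a method here purely for convenience.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyParser:
    """Hypothetical stand-in for one parser on the stack (not the real class)."""
    name: str
    allowed: str      # characters this parser could still accept
    finished: bool    # whether the parser is allowed to end here

    def get_allowed_characters(self) -> str:
        return self.allowed

    def can_end(self) -> bool:
        return self.finished


def prune_exhausted(stack: List[ToyParser]) -> List[ToyParser]:
    """Pop parsers off the top while they can end and accept nothing more."""
    pruned = list(stack)
    while pruned and pruned[-1].can_end() and not pruned[-1].get_allowed_characters():
        pruned.pop()
    return pruned


# Assumed scenario: an escape-sequence parser has just consumed its last character.
stack = [
    ToyParser("object", allowed=",}", finished=False),
    ToyParser("string", allowed='abc"\\', finished=False),
    ToyParser("escape", allowed="", finished=True),
]
remaining = [p.name for p in prune_exhausted(stack)]
```

In this toy setup only the exhausted "escape" entry is dropped, so the string parser becomes the topmost entry again. A parser that can end but still accepts characters (for example, trailing whitespace) is left in place, which is the case the comment above warns about.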
Good point! I just implemented this.
That sounds very reasonable but I'm not sure exactly where in the lifecycle of For now, I implemented the former change. If you want to give more specific guidance I'm happy to try the latter.
On 2nd thought, I'm not totally sure this is true. If the topmost parser's "can end" is true, then the JSONSchemaParser's Let me know if I'm missing something.
get_allowed_characters() not of the JSONSchemaParser, but of a specific object in its stack. I've implemented it in this branch: can you check if it solves the performance issue you are encountering?
Yeah, I got that. I guess I understand your original point now, in that if the topmost parser has some characters that are allowed but wouldn't be allowed by
Yes, it will take me a few minutes as I don't use
You can also see the diff here: d5dcdc5 and apply it locally
My port is in Swift so applying the patches takes some work :)
This change doesn't work for me; in my tests it strips I can't reproduce this in the Python implementation. I suspect the reason is that my version has a stricter whitespace policy, which means that after reading a string my string parser has 0 allowed characters, where this implementation has whitespace in its allowed characters. So, for my use case I think I need to go with the version that alters the But, of course, I defer to you if you'd prefer to use an implementation like what you have in
You're right, added handling of the StringParser in this commit: 34032f6
I merged the holistic solution together with the string parser fix to master. Thank you so much for spotting this issue, the unit test suite now runs twice as fast in the CI! This is a great find. I will be closing this PR soon.
That's great - glad I could help!
Addresses an issue I've seen which I believe is the same as what's reported in #90: an edge case where JSONSchemaParser's shortcut_key implementation did not consider the correct parser while finishing parsing an escape sequence. When the UnionParser that StringParsingState pushes onto the stack after getting a BACKSLASH is finished but not yet removed from the stack, shortcut_key fails to return json_freetext.
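To make the failure mode concrete, here is a rough sketch of the situation described in this PR. It assumes a simplified model: ToyState, naive_shortcut_key and skipping_shortcut_key are hypothetical names, and the shortcut value is reduced to a plain string rather than whatever the real shortcut_key returns.

```python
from dataclasses import dataclass
from typing import List, Optional

JSON_FREETEXT = "json_freetext"  # simplified stand-in for the real shortcut value


@dataclass
class ToyState:
    """Hypothetical stack entry; not the library's actual parser classes."""
    kind: str        # e.g. "string" or "union"
    allowed: str     # characters this entry could still accept
    finished: bool   # whether this entry has completed its work


def naive_shortcut_key(stack: List[ToyState]) -> Optional[str]:
    """Looks only at the topmost entry, even if it is already finished."""
    top = stack[-1]
    return JSON_FREETEXT if top.kind == "string" else None


def skipping_shortcut_key(stack: List[ToyState]) -> Optional[str]:
    """Skips finished entries with nothing left to accept before deciding."""
    for state in reversed(stack):
        if state.finished and not state.allowed:
            continue
        return JSON_FREETEXT if state.kind == "string" else None
    return None


# Assumed scenario: the escape-sequence union parser is done but still on the stack.
stack = [
    ToyState("string", allowed="abc", finished=False),
    ToyState("union", allowed="", finished=True),
]
naive_result = naive_shortcut_key(stack)
skipping_result = skipping_shortcut_key(stack)
```

In this model the naive lookup sees the leftover union entry and misses the free-text shortcut, while the skipping variant reaches the string parser underneath. Note that it only skips entries with no allowed characters left, in line with the concern raised in the conversation about a topmost parser that can still accept input.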